ABSTRACT

This is the second paper in our completeness series which addresses some of the issues raised 
in the previous article by Johnston et al. (2007) in which we developed statistical tests for 
assessing the completeness in apparent magnitude of magnitude-redshift surveys defined by 
two flux limits. The statistics, Tc and Tv, associated with these tests are non-parametric and 
defined in terms of the observed cumulative distribution function of sources; they represent 
powerful tools for identifying the true flux limit and/or characterising systematic errors in 
magnitude-redshift data. 

In this paper we present a new approach to constructing these estimators that resembles an 
"adaptive smoothing" procedure - i.e. one that seeks to maintain the same amount of information, 
as measured by the signal-to-noise ratio, allocated to each galaxy. For consistency with 
our previous work, we apply our improved estimators to the Millennium Galaxy Catalogue 
(MGC) and the Two Degree Field Galaxy Redshift Survey (2dFGRS) data, and demonstrate 
that one needs to use an s/n appropriately tailored for each individual catalogue to optimise the 
performance of the completeness estimators. Furthermore, unless such an adaptive procedure 
is employed, the assessment of completeness may result in a spurious outcome if one uses 
other estimators present in the literature which have not been designed taking into account 
"shot noise" due to sampling. 

Key words: Cosmology: methods: data analysis - methods: statistical - astronomical data bases: 
miscellaneous - galaxies: redshift surveys - galaxies: large-scale structure of Universe. 



1 INTRODUCTION 

In recent years the statistical analysis of galaxy redshift surveys has 
played a central role in cosmology, yielding stringent constraints on 
the parameters of both the underlying cosmological world model 
and on the clustering properties of galaxies as a function of red- 
shift, environment and morphological type. However, both tasks are 
hampered by observational selection effects - due to e.g. detection 
limits in apparent magnitude, colour, surface brightness or some 
combination thereof. A wide range of statistical tools has been de- 
veloped to identify, characterise - and hopefully to remove - the 
impact of observational selection effects from magnitude-redshift 
surveys. Presently, we have the initial data release from the WiggleZ 
Dark Energy Survey (Drinkwater 2010), which will attempt to 
measure the baryon acoustic oscillation (BAO) scale to within 2% 
from 240,000 emission line galaxies. There has also been the 
zCOSMOS survey (Lilly 2009; Zucca et al. 2009), which is exploring 
galaxy evolution through the role of environment at high redshift in 
the range 1.5 < z < 3.0. Achieving such high precision in these 
measurements will require an accurate understanding of the selection 
and, particularly with zCOSMOS, luminosity functions. 

To fully understand the statistical properties of the aforemen- 
tioned selection function it is crucial that we understand the role 
of completeness in apparent magnitude - meaning that all galax- 
ies brighter than some specified limiting apparent magnitude (or, 
as is pertinent to this paper, with apparent magnitudes lying be- 
tween some specified bright and faint limiting values) have been 
observed. A classical test for completeness in apparent magnitude 
is to analyse the variation in galaxy number counts as a function 
of the adopted limiting apparent magnitude (Hubble 1926). This 
test, which presupposes that the galaxy population does not evolve 
with time and is homogeneously distributed in space, is however 
not very efficient. More specifically, it is difficult to decide in prac- 
tice whether deviations from the expected galaxy number count are 
indeed an effect of incompleteness in apparent magnitude, or are 
instead due to galaxy clustering and/or evolution of the galaxy 
luminosity function - or indeed created by incomplete sampling in 
apparent magnitude. Of course in designing a completeness test one 
can also make use of distance information via galaxy redshifts; the 
still widely used and well-known V/Vmax test of Schmidt (1968) 
does this, and considers - for a specified magnitude limit - the 
ratio of two volumes: the volume of a sphere of radius equal to 
the actual distance of the observed galaxy, divided by the volume of 
a sphere of radius equal to the maximum distance at which the 
galaxy would be observable - i.e. at the apparent magnitude limit. 
It follows that - for a non-evolving, homogeneous distribution of 
galaxies - the expected value and variance of V/Vmax are equal 
to 1/2 and 1/12 respectively. The V/Vmax test has been used to 
assess the completeness of magnitude-redshift samples (see for 
example Hudson and Lynden-Bell 1991), but unfortunately it suffers 
from the same major drawbacks as the Hubble test based on galaxy 
number counts: it is difficult to interpret whether any significant 
measured departure from the expected value of V/Vmax is due to 
incompleteness or to clustering and evolutionary effects. 
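
The expectation values quoted above can be checked numerically. The sketch below is only an illustration: it assumes Euclidean space, no k- or evolutionary corrections, a uniform absolute-magnitude distribution and a sharp faint limit, and the helper names `v_over_vmax` and `mock_sample` are hypothetical (they do not come from Schmidt 1968 or from the surveys discussed here). Under those assumptions V/Vmax = (d/d_max)^3 = 10^{0.6 (m - m_lim)}.

```python
import numpy as np


def v_over_vmax(m, m_lim):
    """V/Vmax for a sharp faint limit m_lim, assuming Euclidean space.

    With m = M + 5 log10(d) + 25, d/d_max = 10**(0.2 * (m - m_lim)),
    so the volume ratio is that quantity cubed.
    """
    m = np.asarray(m, dtype=float)
    return 10.0 ** (0.6 * (m - m_lim))


def mock_sample(n, m_lim, radius_mpc=200.0, seed=0):
    """Apparent magnitudes of a homogeneous, non-evolving mock population.

    Galaxies are placed uniformly in volume inside a sphere of radius
    radius_mpc (assumed large enough to contain every d_max), with absolute
    magnitudes drawn uniformly from [-21, -19] (an arbitrary choice).
    """
    rng = np.random.default_rng(seed)
    d = radius_mpc * (1.0 - rng.random(n)) ** (1.0 / 3.0)  # uniform in volume, d > 0
    M = rng.uniform(-21.0, -19.0, n)
    m = M + 5.0 * np.log10(d) + 25.0
    return m[m <= m_lim]  # keep only galaxies brighter than the faint limit


if __name__ == "__main__":
    ratios = v_over_vmax(mock_sample(200_000, 15.0), 15.0)
    print(ratios.mean(), ratios.var())  # should be close to 1/2 and 1/12
```

In this idealised mock the ratios are uniformly distributed on (0, 1], which is why the mean and variance approach 1/2 and 1/12; clustering or evolution would break that uniformity even for a complete sample, which is the drawback described in the text.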

In a seminal paper, Efron and Petrosian (1992) (here- 
after EP92) introduced a powerful new approach to analysing 
magnitude-redshift surveys that drew on concepts developed in the 
so-called C-method of Lynden-Bell (1971) for constructing galaxy 
luminosity functions (LFs). EP92 proposed a non-parametric permutation test for the 
independence of the spatial and luminosity distributions of galaxies in 
a magnitude-limited sample, which required no assumptions con- 
cerning the parametric form of both the spatial distribution and the 
galaxy luminosity function. They applied this test to a quasar sam- 
ple, with an assumed apparent magnitude limit, in order to robustly 
estimate the parameters characterising the luminosity distance- 
redshift relation of the quasars (see also Efron and Petrosian 1999). 

Rauzy (2001) (hereafter R01) noted that the essential ideas 
of EP92 could be straightforwardly adapted and extended to turn 
their non-parametric test of the cosmological model into a 
non-parametric test of the assumption of a magnitude-limited sample - 
thus developing a simple but powerful tool for assessing the 
magnitude completeness of magnitude-redshift surveys. As was the case 
with EP92 - and unlike the Hubble number counts or V/Vmax tests - 
the Rauzy test statistic, Tc, requires no assumption about the spatial 
homogeneity of the galaxy distribution. Moreover, it also requires 
no knowledge of the parametric form of the galaxy luminosity 
function. On the other hand, the Rauzy test was formulated only for the 
case of a sharp, faint apparent magnitude limit. 

Johnston et al. (2007) (hereafter JTH07) discussed the 
advantages of the Tc statistic over standard completeness tests and 
extended its use to data characterised by both a faint and a bright 
magnitude limit. Moreover, they introduced a new variant statistic, 
called Tv, constructed using the sampled cumulative distribution of the 
distance modulus, Z, which retains similar properties to those 
of Tc, i.e. being independent of the spatial distribution of galaxies. 
By sampling the data in this way, the Tv statistic amounted 
to a much improved differential version of the widely used 
V/Vmax test (which assumes spatial homogeneity). JTH07 applied 
their completeness test to three major redshift surveys: the 
Millennium Galaxy Catalogue (MGC) (e.g. Liske et al. 2003; Cross et al. 
2004), the Two Degree Field Galaxy Redshift Survey (2dFGRS) 
(e.g. Colless 2001), and a Sloan Digital Sky Survey - Early Types 
(SDSS-ET) (e.g. Bernardi 2003) sample. They concluded that all 
three surveys were complete in apparent magnitude up to their 
respective published magnitude limits. In the case of the 2dFGRS 
survey data, however, they showed that one is first required to adopt 
a secondary bright apparent magnitude limit - i.e. applying the 
JTH07 generalisation. 



Application of the JTH07 generalised completeness test to 
these three surveys led us to consider two crucial effects that, if 
not accounted for correctly, could lead to wrong statistical conclu- 
sions concerning determination of the true completeness limits. In 
rough terms, the basic construction of the Tc and Tv statistics pro- 
ceeds by identifying volume-limited subsamples associated with 
each individual galaxy in the catalogue. In the design of the 
original Rauzy completeness test, where one is only concerned with the 
faint apparent magnitude limit, these volume-limited subsamples 
were uniquely defined and thus could be allowed to grow such that 
a maximised sampling of the data was achieved. With the 
introduction of a secondary bright limit (as shown in Figure 1) the size of 
each volume-limited subsample is no longer unique. This leads to 
the obvious question: how should one optimally define each 
subsample? 

In studying the distribution of galaxies in the (M, Z) plane 
we are seeking to understand the underlying luminosity function of 
a given population of galaxies, as well as the manner in which that 
function is sampled. To do so we are, of course, inevitably limited to 
inferences drawn from a finite number of galaxies. This makes the 
inference process in principle susceptible to shot-noise and thus, 
if our estimators are constructed from subsamples which are too 
sparsely populated, might lead to spurious results concerning the 
global properties of the data-set. In this paper we therefore propose 
to optimise and extend our current methodology by invoking a well- 
established and objective criterion: we construct our completeness 
estimators so as to maximize their local signal-to-noise ratio. 

The format of this paper will be as follows. In § 2 we revisit the 
main points underpinning the construction of the JTH07 Tc and Tv 
statistics. In § 3 we then explore the adverse consequences that can 
arise if the JTH07 method is applied without properly accounting 
for the impact of sparse sampling. For this exploration we use the 
Millennium Galaxy Catalogue (MGC) and the Two Degree Field 
Galaxy Redshift Survey (2dFGRS), as already studied in JTH07, 
for purely illustrative purposes. This then leads us, in § 4, to 
propose an optimisation technique that is the first step towards 
circumventing these issues. In § 5 we introduce as a sampling threshold 
a direct measurement of the signal-to-noise (s/n) of our sampling 
technique, and demonstrate how this can be implemented. In § 6 
we then discuss our conclusions and future work. 



2 THE 'SEPARABILITY' ASSUMPTION AND 
STATISTICAL FRAMEWORK 

We recall that the fundamental assumption of our method - also 
referred to as 'separability' - is that the luminosity function of the 
galaxy distribution is not dependent on the three-dimensional 
redshift space positions \mathbf{z} = (z, l, b) of the galaxies, where (l, b) are 
galactic directional coordinates. Although this is a rather restrictive 
assumption it underlies most of the traditional completeness tests 
in the literature. The corrected distance modulus Z is defined as 

$$ Z \equiv m - M = \mu(z) + k_{\rm corr} + e_{\rm corr} + A_g(l, b), \tag{1} $$

where k_corr and e_corr are the k-correction and evolutionary 
correction, respectively, μ(z) is the distance modulus at redshift z and 
A_g(l, b) is an extinction correction dependent on galactic 
coordinates. For simplicity we are marginalising over the galactic 
directional coordinates. 

Figure 1. Diagram illustrating the construction of the rectangular regions S1, S2 (left), and S3, S4 (right), which are defined for the random variables ζ_i and 
τ_i respectively, for a typical galaxy at (M_i, Z_i). The left hand panel shows the construction of the regions S1 and S2 with the inclusion of bright and faint 
limits, m_lim^b and m_lim^f respectively. These regions are uniquely defined for a 'slice' of specified width, δZ, in distance modulus, and a 'trial' faint limit m_*. 
The right hand panel illustrates the construction of the rectangular regions S3 and S4 for τ_i. These regions are uniquely defined for a 'slice' of specified width, 
δM, in absolute magnitude and a 'trial' faint limit m_*. 

In assuming separability the joint probability density in 
absolute magnitude and corrected distance modulus can therefore be 
written as 

$$ dP = [h(Z)\,dZ]\,[f(M)\,dM]\;\theta(m_{\rm lim}^{f} - m)\;\theta(m - m_{\rm lim}^{b}), \tag{2} $$

where f(M) and h(Z) are the probability density functions of M 
and Z, respectively, and θ is the Heaviside or 'step' function 
defined as 

$$ \theta(x) = \begin{cases} 1 & \text{if } x \geq 0, \\ 0 & \text{if } x < 0. \end{cases} \tag{3} $$



Thus for each object i present in a catalogue we define the random 
variables ζ_i and τ_i for the statistics Tc and Tv respectively (see footnote 1; for a 
detailed discussion see JTH07), 

$$ \zeta_i = \frac{F(M_i) - F[M_{\rm lim}^{b}(Z_i - \delta Z)]}{F[M_{\rm lim}^{f}(Z_i)] - F[M_{\rm lim}^{b}(Z_i - \delta Z)]} = \frac{n(S_1)}{n(S_1 \cup S_2)} = \frac{r_i}{n_i + 1}, \tag{4} $$

and 

$$ \tau_i = \frac{H(Z_i) - H[Z_{\rm lim}^{b}(M_i - \delta M)]}{H[Z_{\rm lim}^{f}(M_i)] - H[Z_{\rm lim}^{b}(M_i - \delta M)]} = \frac{n(S_3)}{n(S_3 \cup S_4)} = \frac{q_i}{t_i + 1}, \tag{5} $$

where r_i denotes the number of galaxies belonging to region S1, 
n_i the number of galaxies belonging to S1 ∪ S2, q_i the number of 
galaxies belonging to S3, and t_i the number of galaxies belonging 
to S3 ∪ S4. Figure 1 illustrates the construction of the rectangular 
regions S1, S2, S3 and S4 as well as the meaning and definition of 
the slices in distance modulus, δZ, and absolute magnitude, δM. It should 
be mentioned that r_i was also the notation used in EP92 to denote 
the rank of the object i when galaxies are sorted by magnitude. 

Footnote 1: Briefly, Tc and Tv are defined as normalised sums, over the galaxies 
in the sample, of (ζ_i - 1/2) and (τ_i - 1/2) respectively (see JTH07). 

Essentially, the key to the JTH07 extension lay in the 
introduction of these fixed 'slice' widths, δZ for ζ_i and δM for τ_i. Fixing 
these widths to a predetermined value allows the construction of 
unique, separable regions in Equations 4 and 5 within any doubly 
truncated survey, i.e. a survey with well defined bright and faint 
apparent magnitude limits. 
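
The construction of ζ_i and Tc for a doubly truncated sample can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the rectangle edges M_lim^b(Z_i - δZ) = m_lim^b - (Z_i - δZ) and M_lim^f(Z_i) = m_* - Z_i are our reading of Equation 4 and Figure 1, the rank r_i and the count n_i both include galaxy i itself, and the normalisation uses the rank-statistic variance V_i = (n_i - 1) / [12 (n_i + 1)] of Rauzy (2001). The function name `tc_statistic` is hypothetical.

```python
import numpy as np


def tc_statistic(M, Z, m_trial, m_bright, dZ):
    """Tc for a trial faint limit m_trial, a bright limit m_bright and slice width dZ.

    M, Z: absolute magnitudes and corrected distance moduli (m = M + Z).
    Returns NaN when no galaxy has a usable region.
    """
    M = np.asarray(M, dtype=float)
    Z = np.asarray(Z, dtype=float)
    keep = (M + Z) <= m_trial  # apply the trial faint limit
    M, Z = M[keep], Z[keep]

    num, var = 0.0, 0.0
    for Mi, Zi in zip(M, Z):
        M_bright = m_bright - (Zi - dZ)  # bright edge, evaluated at Z_i - dZ
        M_faint = m_trial - Zi           # faint edge, evaluated at Z_i
        if Mi < M_bright:
            continue  # galaxy lies outside its own separable rectangle
        in_rect = (Z >= Zi - dZ) & (Z <= Zi) & (M >= M_bright) & (M <= M_faint)
        n_i = np.count_nonzero(in_rect)              # galaxies in S1 U S2
        if n_i < 2:
            continue  # a lone galaxy carries no rank information
        r_i = np.count_nonzero(in_rect & (M <= Mi))  # galaxies in S1
        num += r_i / (n_i + 1) - 0.5
        var += (n_i - 1) / (12.0 * (n_i + 1))
    return num / np.sqrt(var) if var > 0 else float("nan")
```

For a mock sample that is complete between its bright and faint limits, this statistic should scatter around zero when the trial limit equals the true faint limit, and become negative once the trial limit is pushed fainter, because the trial rectangle then picks up extra galaxies in S2 only.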

However, the choice of the δZ and δM widths is essentially 
arbitrary, and one might wish to consider applying different 'trial' 
widths depending on the properties of the data set under study. 
JTH07 briefly discussed this point, and noted that by varying the 
widths in this manner two distinct effects for the determination of 
the true m_lim^f were revealed: 

• For very small values of δZ and δM the respective Tc and 
Tv statistics will be dominated by what we may term 'shot-noise' 
(since the rectangular regions they identify are extremely sparsely 
sampled); this makes the process of drawing significant conclusions 
regarding the nature of the true faint apparent magnitude limit 
impossible. 

• Conversely, when the values of δZ and δM are taken to be 
very large, then for data-sets that are not well described by a sharp 
m_lim^f one appears to observe a range of possible values for the true 
faint magnitude limit. 

We will illustrate in more detail the manifestation of these two ef- 
fects in the following section. 



4 Teodoro, Johnston & Hendry 



3 CONSEQUENCES OF SPARSE SAMPLING 

For continuity (and illustrative purposes) we revisit the Millennium 
Galaxy Catalogue (MGC), the Two Degree Field Galaxy Redshift 
Survey (2dFGRS) and the Sloan Digital Sky Survey Early Types 
(SDSS-ET) samples as used previously in JTH07. Please refer to 
that paper for the survey descriptions and sample selection. 



3.1 'Shot-noise' dominated sampling 

In this section we examine more closely the consequences of sparse 
sampling issues in the construction of the random variables, ζ_i and 
τ_i, for the statistics Tc and Tv respectively. 

In Figure 2 we have applied the JTH07 Tc and Tv estimators 
to the SDSS-ET (upper panel), MGC (middle panel) and 2dFGRS 
(lower panel) for selected values of δZ and δM. (Both δZ and 
δM are defined in Figure 1.) For the SDSS-ET we observe that 
the Tc and Tv curves corresponding to respective widths of δZ and 
δM = 0.001 and 0.01 fluctuate within the |3σ| limits for each m_*, 
between the survey limits of 14.5 < m_lim < 17.45 (as one would 
expect for a complete sample, following EP92, R01 and JTH07). 
However, contrary to the expectations of those earlier papers, as 
m_* moves beyond the published faint limit of the survey, the Tc 
and Tv curves drop slightly and then flatten (or 'flat-line') inside 
the -3σ < Tc, Tv < 3σ region, instead of dropping sharply below 
the -3σ level. Similar results are seen with MGC at δZ and δM = 
0.01 and 2dF up to δZ and δM ~ 0.02. As we move to increasingly 
larger values of δZ and δM, as shown in Figure 2, the Tc and Tv 
curves continue to 'flat-line' beyond the magnitude limit, but now 
do so at a value of the statistic which lies increasingly below -3σ. 

This so-called 'flat-lining' effect can essentially be used as a 
means of identifying the 'shot-noise' level for a given width of δZ 
and δM - i.e. the value of δZ and δM below which the sampling 
becomes too sparse to allow the magnitude limit to be reliably 
estimated. Understanding precisely why this 'flat-lining' happens 
becomes quite straightforward when one considers carefully what are 
the contributing factors: the number of objects in the catalogue and 
the range in apparent magnitude of the survey. The effect is 
illustrated in detail in Figure 3. The left panel shows the now familiar 
M-Z distribution with the red diagonal lines representing the faint 
apparent magnitude limit m_lim^f and our adopted bright limit m_lim^b. 
The main feature of this plot is the narrow red, blue and green 
rectangles which actually delineate the Tc regions S1 and S2 for 
a galaxy at (M_i, Z_i) with δZ = 0.001, 0.008 and 0.02 respectively. 
(Here we are considering a trial m_* equal to the survey limit, i.e. 
m_lim^f = 19.45 mag.) Since these rectangular 'strips' represent such 
a tiny fraction of the M-Z distribution they can barely be separated 
in the main diagram. The left panel, therefore, also shows a 
close-up of this particular region, where the distinctive coloured areas are 
now clearly defined. (Note that, because of the very narrow range 
of distance moduli considered in this close-up, the apparent 
magnitude limit appears essentially as a vertical line.) The right panel 
of Figure 3 represents, for the same galaxy at (M_i, Z_i), the 
equivalent Tv construction with δM = 0.001, 0.008 and 0.02 respectively 
- with again the different coloured regions also shown in extreme 
close-up. 

What is immediately apparent for both the Tc and Tv statistics 
is the very small number of galaxies that populate the rectangular 
regions for these small values of δZ and δM. In particular, it is 
clear that as m_* is increased beyond the true value m_lim^f, no further 
galaxies will be added to the subsets S2 (for Tc) and S4 (for Tv). 
By considering Equations 4 and 5, it then follows that the Tc and 
Tv statistics will remain constant for larger values of m_* - which 
explains the 'flat-lining' effect seen in Figure 2. 

The pattern which was apparent in Figure 2, whereby the 
'flat-lining' effect occurred at progressively lower values of Tc and Tv 
as the widths of δZ and δM were increased, can be extended to the 
limiting case that corresponds to the original Rauzy (R01) 
completeness test - where the absence of a bright apparent magnitude 
limit means that there is in principle no limit to the height of the 
constructed regions. However, since we are dealing with a 
flux-limited catalogue that contains a finite number of galaxies, we can 
expect that ultimately the 'flat-lining' effect will become apparent 
for the R01 completeness test too, if we consider a sufficiently faint 
trial value of m_*. This effect is indeed seen in Figure 4, albeit for 
a value of Tc and Tv that lies enormously below the characteristic 
3σ level which one might choose to identify as the value of the 
statistic indicating the true apparent magnitude limit. 

In summary, then, we can understand the 'flat-lining' effect as 
a direct consequence of the very sparse sampling which occurs for 
small values of δZ and δM. A suitable choice for the width of δZ 
and δM can then be taken to be the values for which the onset of 
the 'flat-lining' effect only occurs when the test statistics Tc and Tv 
have already dropped to 3σ below their expected value, when 
the trial apparent magnitude limit is equal to the true value m_lim^f. 



3.2 Variation in m_lim 

We now briefly consider the apparent variation in the determined value 
of m_lim that results from the adoption of larger values of δZ and 
δM for a survey that is doubly truncated by a bright and a faint 
magnitude limit. If we consider once again Figure 2 we can see that for 
both SDSS-ET and MGC as we move to larger values of δZ and 
δM our ability to determine correctly the true completeness limit 
of the survey is unaffected. However, we can quite clearly see in 
the case of the 2dFGRS on the lower panel that, as δZ and δM 
increase to, and beyond, the point at which the test statistics 
systematically fall below their -3σ level (which we adopt to indicate the 
true apparent magnitude limit), the value of m_lim^f also varies with 
the values of δZ and δM adopted. In the range of values we are 
considering in this example, 0.002 < δZ, δM < 0.5, we actually 
observe a corresponding range of m_lim from 19.0 < m_lim ≲ 19.4. 
This variation in the 'true' magnitude limit inferred for a 
survey is rather unsatisfactory, and would somewhat defeat the 
purpose of the original Rauzy completeness test: to provide a robust, 
non-parametric and objective method for independently validating 
the magnitude completeness of a given survey. It underlines the 
importance of optimising the performance of our test statistics - an 
issue which we consider in more detail in the following sections. 
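
The criterion adopted above, in which the magnitude limit is read off where the statistic systematically falls below -3σ, can be made concrete with a small helper. This is a hedged sketch: interpreting "systematically" as "below the threshold for every fainter trial magnitude" is our assumption, and the function name `faint_limit_estimate` is hypothetical.

```python
import numpy as np


def faint_limit_estimate(m_trial, t_values, threshold=-3.0):
    """Faintest trial magnitude before the statistic drops below `threshold`
    and stays below it for all fainter trial magnitudes.

    Returns None if the statistic never settles below the threshold, or if
    it is already below the threshold at the brightest trial magnitude.
    """
    m_trial = np.asarray(m_trial, dtype=float)
    t_values = np.asarray(t_values, dtype=float)
    order = np.argsort(m_trial)
    m_sorted, t_sorted = m_trial[order], t_values[order]

    below = t_sorted < threshold
    # stays_below[k] is True when every trial from index k onwards is below threshold
    stays_below = np.logical_and.accumulate(below[::-1])[::-1]
    if not stays_below.any():
        return None
    first = int(np.argmax(stays_below))
    if first == 0:
        return None
    return float(m_sorted[first - 1])
```

An isolated excursion below the threshold at a bright trial magnitude is ignored by this rule, so a curve that dips once and recovers does not trigger a spurious limit.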



Figure 2. Tc and Tv results for the SDSS-ET (upper panels), MGC (middle panels) and 2dF (lower panels) applying the JTH07 method for varying values 
of δZ and δM, where we can observe the transition between 'shot-noise' dominated sampling and signal dominated sampling. We can define this transition 
to occur at the point where the width of δZ or δM, for Tc and Tv respectively, is sufficiently large that the appropriate statistic drops to the -3σ level at the 
faint magnitude limit of the survey. This occurs at values of δZ and δM > 0.01 for SDSS, > 0.07 for MGC and > 0.02 for 2dF. 

Figure 3. Schematic illustrating the cause of the 'flat-lining' effect (within the |3σ| confidence limits) observed for small values of δZ and δM. The left hand 
panel shows the (M, Z) distribution for the 2dFGRS with the faint apparent magnitude limit m_lim^f, and our adopted bright limit, m_lim^b, indicated as red diagonal 
lines. The left hand plot considers the ζ_i construction for a galaxy at (M_i, Z_i), with δZ = 0.001, 0.008 and 0.02. The right hand plot zooms in to allow us to 
see the three distinct regions created as the size of δZ, and the relative number of galaxies contained therein, increases. Similarly, the right hand panel shows 
the corresponding τ_i construction for a galaxy at (M_i, Z_i), for δM = 0.001, 0.008 and 0.02. 

© 2010 RAS, MNRAS 000, 1-9 

[Figure 4 panels: Rauzy (2001) method; horizontal axis: limiting apparent magnitude m_*.] 

Figure 4. Tc and Tv plots for 2dFGRS (left) and MGC (right). Here we apply the R01 method, where the rectangular regions in Figure 1 are allowed to grow 
to their maximum size when accounting for m_lim^f only. We can see in both panels that if one allows m_* to pass far enough beyond the magnitude limit of the 
survey, the 'flat-lining' effect will eventually dominate, albeit for extremely negative values of Tc and Tv. 

4 EXPRESSIONS FOR THE SIGNAL-TO-NOISE OF OUR 
ESTIMATORS 

In this section we now consider how the estimators ζ_i and τ_i are 
constructed, and in particular how they will be affected by random 
sampling fluctuations, in order to gain insight into how they might be 
optimised. This will essentially involve computing a measure of the 
s/n on the sampled ζ_i and τ_i, and how those variables are affected 
by fluctuations in the number of galaxies sampled in the regions S1 
to S4. For the moment let us consider ζ_i only. If we assume that 
the survey galaxies are sampled according to a Poisson distribution 
then we can derive an expression for the Poisson (or shot) noise 
associated with ζ_i by applying simple perturbation theory. In this 
case Equation 4 then becomes 



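$$ \delta\zeta_i = \frac{\delta r_i\,(n_i + 1) - r_i\,\delta(n_i + 1)}{(n_i + 1)^2}. \tag{6} $$

To take into account the cross-terms we square Equation 6 to get, 

$$ (\delta\zeta_i)^2 = \frac{\delta r_i^2}{(n_i + 1)^2} + \frac{\zeta_i^2\,[\delta(n_i + 1)]^2}{(n_i + 1)^2} - \frac{2\,\zeta_i\,\delta r_i\,[\delta(n_i + 1)]}{(n_i + 1)^2}, \tag{7} $$

and 

$$ \left(\frac{s}{n}\right)_{\zeta_i} \equiv \frac{\zeta_i}{\delta\zeta_i} = \left[ \frac{\delta r_i^2}{r_i^2} + \frac{[\delta(n_i + 1)]^2}{(n_i + 1)^2} - \frac{2\,\delta r_i\,[\delta(n_i + 1)]}{r_i\,(n_i + 1)} \right]^{-1/2}. \tag{8} $$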



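By applying a similar approach for Tv we can obtain a similar 
expression for the s/n associated with estimating τ_i. Starting from 
Equation 5 we can show that, 

$$ \left(\frac{s}{n}\right)_{\tau_i} \equiv \frac{\tau_i}{\delta\tau_i} = \left[ \frac{\delta q_i^2}{q_i^2} + \frac{[\delta(t_i + 1)]^2}{(t_i + 1)^2} - \frac{2\,\delta q_i\,[\delta(t_i + 1)]}{q_i\,(t_i + 1)} \right]^{-1/2}. \tag{9} $$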
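
The s/n expressions for ζ_i and τ_i share the same algebraic form, so one helper covers both. The sketch below evaluates that form for a ratio r / (n + 1). The default noise model is an assumption made here for illustration and is not stated in the text: Poisson variances δr^2 = r and [δ(n + 1)]^2 = n + 1, with the cross term δr δ(n + 1) set to r because the region S1 is contained in S1 ∪ S2. The function name `snr_ratio` is hypothetical.

```python
import math


def snr_ratio(r, n, var_r=None, var_n1=None, cross=None):
    """Signal-to-noise of the ratio r / (n + 1) from first-order error propagation.

    r, n   : counts in the inner region and in the combined region
    var_r  : squared fluctuation of r          (assumed default: r)
    var_n1 : squared fluctuation of n + 1      (assumed default: n + 1)
    cross  : cross term between the two counts (assumed default: r)
    """
    if r <= 0 or n + 1 <= 0:
        raise ValueError("r and n + 1 must be positive")
    var_r = float(r) if var_r is None else var_r
    var_n1 = float(n + 1) if var_n1 is None else var_n1
    cross = float(r) if cross is None else cross

    # relative variance: the bracketed quantity raised to the power -1/2
    rel_var = var_r / r**2 + var_n1 / (n + 1) ** 2 - 2.0 * cross / (r * (n + 1))
    if rel_var <= 0:
        return math.inf
    return rel_var ** -0.5


if __name__ == "__main__":
    print(snr_ratio(75, 149))             # correlated counts (default)
    print(snr_ratio(75, 149, cross=0.0))  # counts treated as independent
```

The same call with (q_i, t_i) in place of (r_i, n_i) gives the τ_i case. How the cross term is modelled changes the result noticeably, so the defaults should be replaced by whatever noise model is appropriate for the data in hand.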



5 IMPLEMENTATION 

5.1 Establishing s/n Thresholds 

With our expressions for the s/n of ζ_i and τ_i we now explore the 
way in which the concept of an s/n threshold, beyond which the 
Tc and Tv statistics 'flat-line', may be integrated into our code for 
computing these statistics for a given survey. First we recall a 
fundamental property of both estimators, for a given m_*: by their 
construction, both Tc and Tv should have a Gaussian sampling 
distribution with mean zero and a variance equal to unity. We can therefore 
use the s/n expressions derived in the previous section to establish 
minimum s/n thresholds that will ensure the sampling distribution 
of both Tc and Tv is indeed Gaussian, with the correct mean and 
variance, for each m_* - and particularly for fainter trial magnitudes 
closer to m_lim^f. 

A procedure by which we can achieve this is illustrated in Fig- 
ure 5 and the discussion which follows. In all three plots we present 
the following: 

• Top panel: the Tc and Tv curves, shown as solid and dashed 
lines respectively, for a fixed, target value of δZ and δM 
respectively. We also indicate the imposed bright and faint apparent 
magnitude limits, m_lim^b and m_lim^f respectively. 

• 2nd panel: here the achieved maximum (or peak) value of both 
δZ and δM, at each m_*, is shown in green, while the mean value 
of δZ and δM is shown in blue. 

• 3rd panel: here we show, for each m_*, the resulting peak s/n 
indicated by the green curve, while the mean s/n is indicated by the 
blue curve. In this case the solid lines represent the s/n for ζ_i whilst 
the dashed lines are for τ_i. 

• 4th panel: here we show a histogram of the apparent 
magnitude distribution for the survey under consideration. 

Let us consider the SDSS-ET survey, shown in the left-hand 
plot of Figure 5. Here we have applied the usual JTH07 method 
with a target width of δZ and δM = 0.015. We use the phrase 
'target width' as this version of the method seeks to maximise the 
sampling range of apparent magnitude within the JTH07 approach. 
Thus, we have allowed δZ and δM widths smaller than the 
targeted, fixed value to be included in the calculation. It therefore 
becomes clear that for the initial increments of m_*, where the 
distance between m_* and m_lim^b is small (coupled with low numbers of 
galaxies), δZ and δM will not reach the target width. 

In the 2nd panel of SDSS-ET, we show the resulting maximum 
value, as well as the mean value, of δZ and δM that was achieved 
for each m_*. In particular the mean value curves are clearly seen 
to fall below the target width, as described above, for the initial 
increments of m_*. 

Figure 5. Plots demonstrating how we can establish a minimum signal-to-noise (s/n) threshold in SDSS-ET (left), MGC (middle) and 2dF (right) from the 
JTH07 approach. For each survey we show the imposed apparent magnitude limits, m_lim^b and m_lim^f, as vertical black dashed lines. In each case we choose a 
targeted, fixed value of δZ and δM (2nd panels) for Tc and Tv respectively (top panels) for which the resulting curves drop and 'flat-line' on or below the 
-3σ confidence limit of each test statistic. These widths correspond to δZ and δM = 0.015 (SDSS-ET), 0.065 (MGC) and 0.02 (2dF). As these are our target 
values, the 2nd panels for each survey show the resulting maximum ('peak', shown in green) value of δZ and δM that was achieved at each m_*, as well as 
the mean values (shown in blue). To maximise the sampling range of apparent magnitude with the JTH07 method, we have, where necessary, included δZ and 
δM widths smaller than the targeted, fixed value in the calculation. It therefore becomes clear that for the initial increments of m_*, where the 
distance between m_* and m_lim^b is small (coupled with low numbers of galaxies), δZ and δM will not reach the target width. The mean values of δZ and δM 
(shown in blue in the 2nd panels) for all surveys clearly illustrate this effect. In the case of MGC, in particular, we observe initial increments of the peak curves 
indicating that for no galaxy are we able to construct a separable region with δZ and δM = 0.065. By choosing the δZ and δM widths such that the statistics drop to the -3σ 
limit beyond m_lim^f, we have, in effect, established a minimum s/n threshold that should ensure our estimators are not subject to the effects of very sparse 
sampling at brighter trial apparent magnitudes. We find this choice corresponds to an s/n ~ 12.3 for the SDSS data, ~ 14.0 for MGC and ~ 31.75 for the 2dF. 

The choice of this width was made so that the Tc and 
Tv curves drop to on or below their -3σ confidence level at 
m_lim^f = 17.45. In the case of SDSS-ET this value of δZ and δM 
corresponds to an s/n level ≈ 12.3 for both Tc and Tv. For this survey, 
therefore, one would need to maintain a minimum s/n threshold of 
~12.3 to ensure that the Tc and Tv statistics do not 'flat-line' due 
to very sparse sampling at magnitudes brighter than m_lim^f = 17.45. 
We will explore further the consequences of this in the following 
section. 

In the remaining plots in Figure 5 we apply the 
corresponding procedure, with the same goal of ensuring that the 'flat-lining' 
behaviour occurs for sufficiently small values of the test statistics, 
to the MGC and 2dFGRS. For MGC in the middle plot, we need 
to set δZ and δM = 0.065 and find a mean s/n threshold of ~14.0. 
Finally, for 2dFGRS, we need to set δZ and δM = 0.02, which 
corresponds to a mean threshold of s/n ~ 31.75. 



5.2 Imposing the s/n Thresholds 

We can now use our pre-determined s/n levels, established in the 
previous section, and explore their impact on the Tc and Tv 
estimators. In Figure 6 we once again show the three surveys used for 
illustrative purposes in the same format as shown in Figure 5 and 
detailed in § 5.1. 

Let us first consider the MGC data shown in the middle plot.
As a simplistic approach to implementing an s/n threshold we have
decided to keep the average s/n constant throughout the sampling
procedure. This is achieved by keeping constant the number of
galaxies counted in S1 ∪ S2 (for ζ) and S3 ∪ S4 (for τ). For MGC,
achieving the minimum s/n level of ~ 14.0, already established from
Figure 5, requires that the number of galaxies is equal to 150 in
these combined regions. If we look at the 2nd panel for MGC we
can observe the consequences for both δZ and δM as m* increases
towards the true magnitude limit, m_lim^f, of the survey. Initially, we
see that δZ and δM are required to be rather large in order
to achieve the minimum s/n level. This behaviour is expected and
echoed by the histogram shown in the bottom panel of the plot. As
the density of galaxies increases for fainter values of m*, we see a
sharp decline in the width of δZ and δM required to achieve the
same s/n. We also note that imposing a minimum s/n level restricts
the magnitude sampling range within which Tc and Tv can reliably
test completeness, particularly for brighter apparent magnitudes,
and effectively introduces a value of m* at which the test statistics
'initialise'. In MGC, this initialisation occurs at m* ~ 17.6 mag.
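
As a rough illustration of why a fixed galaxy count translates into widths that shrink as the sampling becomes denser, the following Python sketch computes, for a sorted one-dimensional sample, the width of the interval ending at each point that contains a fixed number of points. It is a simplified one-dimensional analogue and not the procedure used to produce Figure 6: the function name `fixed_count_widths`, the parameter `n_target` and the toy input are assumptions made purely for illustration, and the two-dimensional regions S1 ∪ S2 and S3 ∪ S4 are not modelled.

```python
import numpy as np


def fixed_count_widths(z, n_target):
    """Width needed at each point to enclose a fixed number of points.

    For each value z[i] of the sorted sample, return the width w such that
    the interval [z[i] - w, z[i]] reaches back to the n_target-th point
    (counting z[i] itself). Points with fewer than n_target values at or
    below them get NaN, since no width can satisfy the count there.
    """
    z = np.sort(np.asarray(z, dtype=float))
    widths = np.full(z.shape, np.nan)
    k = n_target - 1  # number of extra points needed below z[i]
    if k < 0 or k >= z.size:
        return z, widths  # count cannot be met anywhere
    widths[k:] = z[k:] - z[: z.size - k]
    return z, widths


# Toy sample (hypothetical values) whose density increases with z.
z_sorted, w = fixed_count_widths([0.0, 4.0, 6.0, 7.0, 7.5], n_target=2)
print(z_sorted)
print(w)  # [nan 4.  2.  1.  0.5]
```

The leading NaN entry plays the role of the 'initialisation' point: no width can satisfy the count until enough points are available, after which the required width falls as the sample becomes denser.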

If we now turn our attention to SDSS-ET on the left plot of
Figure 6 we can see that the distribution of galaxies on the M-Z
plane is such that we do not throw away much information at the bright
end of the apparent magnitude range. Both Tc and Tv initialise at
around m* ~ 15.1, after which we see a steep drop in δZ
and δM similar to that seen with MGC. To achieve the minimum s/n
level of ~ 12.3, the number of galaxies to be counted in the separable
regions is required to be 130.

It is interesting to note that with the SDSS-ET survey, the Tc
and Tv statistics initially fluctuate below -3σ for 15.1 < m*
< 15.5. Similar behaviour is also observed with 2dF (see the right-hand
plot). We recall that both the SDSS-ET and 2dF surveys are well
described by a bright and a faint apparent magnitude limit, and as
such are subject to natural restrictions on the maximum size of the
ζ and τ sample regions that retain the separability assumptions of
the estimators - see § 3 for further clarification of this point. In
implementing an s/n threshold in our code we have, in
this instance, allowed δZ and δM to grow in size beyond the limit
imposed at the bright end. Therefore, until δZ and δM narrow to
a width that defines the separable region within the survey limits,
the estimators will indicate incompleteness. As we have already
discussed, MGC can be well described by a faint limit only and is
therefore not adversely affected by large values of δZ and δM.
Figure 6. This figure shows the resulting Tc and Tv curves for all three surveys when we adopt a constant s/n level based on the approximate thresholds
established in Figure 5. The first feature distinguishing it from Figure 5 is the deliberate omission of an imposed bright limit, m_lim^b. Secondly,
in this simplified approach we keep the average s/n constant by keeping constant the number of galaxies we count in S1 ∪ S2 (for ζ) and S3 ∪ S4 (for
τ). By doing this, however, we sacrifice information at the bright end of the luminosity function, where the survey is too sparsely sampled to be
included in the calculation. This effect is particularly obvious in the Tc and Tv results for MGC (middle plot) and 2dFGRS (right plot) and is mirrored in their
respective histogram distributions. However, the advantage of this approach is that we now have adaptive δZ and δM widths for the respective estimators:
as m* increases, the M-Z distribution becomes more densely populated, resulting in an almost asymptotic drop in the widths required to achieve
the same s/n level. As already mentioned, we have omitted m_lim^b. This allows us to achieve the minimum s/n for a greater range in apparent magnitude, and
therefore allows δZ and δM to grow as large as required. Such large values are only evident for initial values of m*, as shown in the 2nd panels of each survey.
For MGC (middle plot) this has no adverse effect on the respective Tc and Tv curves, as MGC is equally well described by a single faint apparent magnitude
limit, m_lim^f. However, we can see that for SDSS-ET and 2dF, the Tc and Tv statistics show initial fluctuations below -3σ, which is to be expected since both
surveys are defined with bright magnitude limits. As δZ and δM decrease in size with increasing m*, so the estimators become less sensitive to the presence of



Finally, with the 2dF survey in the right-hand panel of Figure 6,
we have set the number of galaxies to 900, which seems to satisfy
our s/n criterion in our new scenario. As we have just discussed,
there are slight fluctuations below -3σ at bright values of m*,
i.e. for m* ~ 16.4 mag. These correspond to the adoption of large
widths for δZ and δM. It should be noted that even with the
minimum s/n value, one can anticipate the true faint limit, m_lim^f, of
2dF being identified as brighter than the published limit of
m_lim^f = 19.45 if one were to move to higher s/n levels.
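
Because the statistics can dip below -3σ at bright trial magnitudes before recovering, reading a limit off such a curve calls for some care. The Python sketch below shows one possible convention, which is an assumption on our part and not a rule stated in this paper: it returns the trial magnitude from which the statistic stays at or below the threshold for all fainter trial magnitudes, so that early excursions are ignored. The function name `sustained_crossing` and the toy values are hypothetical and are not survey measurements.

```python
import numpy as np


def sustained_crossing(m_trial, t_stat, threshold=-3.0):
    """First trial magnitude from which t_stat stays <= threshold.

    m_trial is assumed sorted from bright to faint. Returns None if the
    statistic is not at or below the threshold at the faintest trial
    magnitude. NaN entries (statistic not yet initialised) count as
    'not below'.
    """
    m_trial = np.asarray(m_trial, dtype=float)
    t_stat = np.asarray(t_stat, dtype=float)
    below = t_stat <= threshold  # comparisons with NaN give False
    if below.size == 0 or not below[-1]:
        return None
    not_below = np.flatnonzero(~below)
    start = not_below[-1] + 1 if not_below.size else 0
    return float(m_trial[start])


# Toy curve: an early dip below -3, a recovery, then a sustained drop.
m_toy = [15.0, 15.2, 15.4, 16.0, 17.0, 17.2, 17.4]
t_toy = [-3.4, -3.1, -0.5, 0.3, -1.2, -3.6, -7.0]
print(sustained_crossing(m_toy, t_toy))  # 17.2
```

A simple first-crossing rule would instead return the early dip at the bright end, which is the behaviour described above for SDSS-ET and 2dF.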



6 DISCUSSION AND CONCLUDING REMARKS 

In this article we have introduced a method which attempts to
optimize the completeness estimators, suitable for application to
double-truncated galaxy survey data, as previously developed by
Johnston et al. (2007). Our new approach resembles an "adaptive
smoothing" procedure which seeks to maintain a constant level of
'information' - as characterised by the signal-to-noise ratio
computed for our test statistics - allocated to each galaxy in the survey.
In applying this methodology to three well understood and
characterized surveys, we have demonstrated the importance of properly
accounting for the impact of sparse sampling in each galaxy
survey. Furthermore, our results indicate that - without adopting such
a procedure - the testing of magnitude completeness may be
compromised, and spurious values for the 'true' apparent magnitude
limit(s) may be inferred. Thus, sparse sampling effects may
adversely affect previous applications of product-limit estimators
to doubly-truncated data sets in the literature,
e.g. Efron and Petrosian (1999).

The current article is the first of a two-part story. In the
current paper we have set out to optimise our completeness estimators
by imposing a lower limit on the number of galaxies contained in
(and hence a lower limit on the width of) the rectangular regions
we identify in the M-Z distribution of our data. This lower limit
ensures that the Gaussian sampling distribution, with mean zero
and variance unity, of our Tc and Tv statistics is preserved over the
range of m* where the optimization is possible. In an upcoming
publication (Johnston et al. 2010, in preparation) we will consider
in more detail the practical implementation of these optimised
estimators - and in particular how we may use them to assign error
bars to Tc and Tv, and hence to compute confidence limits for the
faint apparent magnitude limit, m_lim^f, properly accounting for the
correlations in Tc and Tv between neighbouring values of the trial
magnitude limit m*.